Conversation

@bradley-erickson (Collaborator) commented Apr 18, 2025

Added a generic rate-limiting decorator.
TODO

  • Pull rate limits from settings
  • Figure out the best way to handle this on the dashboards (what do we say to the user?)

@bradley-erickson (Collaborator, Author)

This is merging into a branch (dashboard updates) for now, since it was easier to work on this way in my environment. Once that other branch is merged, this can be merged into master.

async def check_rate_limit(user_id):
    '''Reusable rate limiter with service-specific settings'''
    # TODO fetch from pmss/define appropriate window
    max_requests = 2
Contributor:

Something more descriptive, e.g.:

max_user_requests_per_window



def create_rate_limiter(service_name):
    '''Factory function for rate limiters with closure over service name'''
Contributor:

That repeats the function name. I would describe what the rate limiter does (e.g. max # of requests per window). Basically, the autogenerated docs should tell me what this does.
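
E.g., something along these lines (wording illustrative):

    def create_rate_limiter(service_name):
        '''Return an async check that allows each user at most
        `max_requests` calls to `service_name` per time window.'''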

now = time.time()

# Initialize user/service tracking
key = f'rate_limit:{service_name}:{user_id}'
Contributor:

limiter_key

RATE_LIMITERS[key] = collections.deque()

# Expire old requests
timestamps = RATE_LIMITERS[key]
Contributor:

request_timestamps, maybe? Or prior_requests_timestamps?



def rate_limited(service_name):
    '''Decorator for async functions needing rate limiting'''
Contributor:

This should clearly document what kinds of functions this can wrap, and the protocol. Things like this:

            if 'runtime' not in kwargs:
                raise TypeError(f'`{func.__name__}` requires `runtime` keyword argument for checking rate limits.')

should be in the docstring. Basically, I want to know how to use this.
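
For instance, a sketch of the kind of docstring being requested (wording and service name illustrative, mirroring the runtime check above):

    def rate_limited(service_name):
        '''Decorator for async functions that should be rate limited per user.

        The wrapped function must be called with a `runtime` keyword
        argument (used to look up the active user); a TypeError is raised
        if it is missing. When the user exceeds the per-window request
        limit for `service_name`, the call raises PermissionError instead
        of running.

        Usage:
            @rate_limited('my_service')
            async def handler(*args, runtime=None, **kwargs):
                ...
        '''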


runtime = kwargs['runtime']

check_limit = create_rate_limiter(service_name)
Contributor:

check_rate_limit would be slightly nicer.

user = await learning_observer.auth.get_active_user(request)
user_id = user[learning_observer.constants.USER_ID]
if not await check_limit(user_id):
    raise PermissionError(f'Rate limit exceeded for {service_name} service')
Contributor:

I'm actually a bit confused about when I'd want this behavior. It'd be helpful to know when this is used.

Seems like what I want:

  1. Queue up requests
  2. If they go obsolete (e.g. user navigates away), drop them
  3. If they don't, let the user know we're throttling and throttle them

Simply failing seems like it might be annoying to the user.
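
For what it's worth, a minimal sketch of throttling instead of failing, reusing the PR's rate check (the `wait_for_slot` helper and poll interval are hypothetical):

    import asyncio

    async def wait_for_slot(check_limit, user_id, poll_seconds=0.5):
        # Instead of raising PermissionError right away, poll until the
        # sliding window frees a slot, so the request is delayed rather
        # than rejected.
        while not await check_limit(user_id):
            await asyncio.sleep(poll_seconds)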

@pmitros (Contributor) commented Apr 23, 2025

I did a review. The code is fine at the code level, but I'm concerned at the algorithm level. I think what we want is:

  1. Have a rate limit.
  2. Keep a finite number of requests in parallel. A lot of this is for LLMs, where we might want e.g. a maximum of 5 parallel calls to OpenAI.
  3. We want some overall throttle (e.g. 30 requests per minute).
  4. If users go over that, simply let them know it will take a while since they're over the rate limit; run the request once they're dethrottled.
  5. If they navigate away or e.g. submit many times, drop obsolete in-queue requests which haven't gone out yet.
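
Not the PR's current behavior — just one possible shape of points 1-5 above, with illustrative names and limits:

    import asyncio
    import collections
    import time

    MAX_PARALLEL = 5        # point 2: e.g. at most 5 parallel LLM calls
    MAX_PER_WINDOW = 30     # point 3: overall throttle
    WINDOW_SECONDS = 60

    parallel_slots = asyncio.Semaphore(MAX_PARALLEL)
    request_times = collections.deque()

    async def run_throttled(make_request, is_obsolete=lambda: False):
        '''Run make_request() under both limits; drop it if it becomes
        obsolete (point 5) while still queued.'''
        async with parallel_slots:          # point 2: cap parallel requests
            while True:
                now = time.time()
                # Expire timestamps that have aged out of the window.
                while request_times and request_times[0] <= now - WINDOW_SECONDS:
                    request_times.popleft()
                if len(request_times) < MAX_PER_WINDOW:
                    break
                if is_obsolete():           # point 5: user navigated away
                    return None
                # Point 4: over the throttle, so wait (the UI can tell the
                # user it will take a while) until the oldest request ages out.
                await asyncio.sleep(request_times[0] + WINDOW_SECONDS - now)
            if is_obsolete():
                return None
            request_times.append(time.time())
            return await make_request()

A caller would wrap each outgoing LLM call in run_throttled, passing an is_obsolete check tied to the dashboard session so stale requests never go out.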
